Skip to content

feat: add NAESatisfiability → SetSplitting reduction (#841) [full pipeline v2]#985

Closed
zazabap wants to merge 3 commits intomainfrom
feat/841-full-pipeline
Closed

feat: add NAESatisfiability → SetSplitting reduction (#841) [full pipeline v2]#985
zazabap wants to merge 3 commits intomainfrom
feat/841-full-pipeline

Conversation

@zazabap
Copy link
Copy Markdown
Collaborator

@zazabap zazabap commented Apr 1, 2026

Summary

Implements NAESatisfiability → SetSplitting using the updated /verify-reduction/add-reduction pipeline. Compare with PR #984 (old pipeline) to see the differences.

What's new vs PR #984

Aspect PR #984 (old) This PR (new)
Type gate Not checked OrOr verified in Step 1
Test vectors JSON Not exported Exported, used for Rust test generation
Canonical example in example_db Missing (#974 Check 9) Included in rule_builders.rs
Example-db lookup test Missing (#974 Check 10) Included in example_db.rs
Verification artifact cleanup Artifacts left in repo Cleaned up — Python/Typst/PDF removed
Tests 5 tests 7 tests (added all-assignments round-trip, larger instance)

Commit structure

  1. 168f6be — verification artifacts (Typst proof + Python scripts from PR docs: verify reduction #841 — NAESatisfiability → SetSplitting #980)
  2. 17e9bca — Rust implementation + example_db + tests + artifact cleanup

The second commit shows the clean pattern: -6942 lines (artifacts removed) + 324 lines (Rust added).

Pipeline flow demonstrated

Issue #841 → /verify-reduction (type gate: Or→Or ✓)
           → Typst proof + constructor (32K checks) + adversary (46K checks)
           → test vectors JSON exported
           → /add-reduction (Rust from verified Python spec)
             → Step 1: reduce_to() + extract_solution()
             → Step 3: 7 tests from test vectors
             → Step 4: canonical example in rule_builders.rs
             → Step 4b: example-db lookup test
             → Step 7: verification artifacts cleaned up

Test plan

  • cargo test naesatisfiability_to_setsplitting — 7 tests pass
  • cargo clippy — clean
  • Canonical example registered and passing coverage gate
  • Verification artifacts removed from docs/paper/verify-reductions/

Closes #841

🤖 Generated with Claude Code

zazabap and others added 2 commits April 1, 2026 06:38
Typst: Construction (3 steps) + Correctness (⟹ + ⟸) + Extraction + Overhead + YES/NO examples
Constructor: 32,513 checks, 0 failures (exhaustive n ≤ 5, 7 sections)
Adversary: 46,774 checks, 0 failures (independent impl + hypothesis)
Cross-comparison: 615 instances, all agree

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Implemented via /verify-reduction -> /add-reduction pipeline.
Includes: reduction rule, 7 tests, canonical example in example_db.
Verification artifacts cleaned up (79K+ Python checks preceded this Rust impl).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.04%. Comparing base (423506c) to head (bd516ef).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff            @@
##             main     #985    +/-   ##
========================================
  Coverage   98.03%   98.04%            
========================================
  Files         784      786     +2     
  Lines       82310    82461   +151     
========================================
+ Hits        80695    80846   +151     
  Misses       1615     1615            

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

zazabap added a commit that referenced this pull request Apr 1, 2026
Root cause: PRs #985 and #991 failed CI because:
1. Local clippy doesn't use -D warnings but CI does (caught needless_range_loop)
2. New reductions can create paths that dominate existing direct reductions
   (test_find_dominated_rules_returns_known_set has hardcoded known set)

Added to Step 6:
- Mandatory `cargo clippy -- -D warnings` (not just `cargo clippy`)
- Mandatory `cargo test` (full suite, not filtered)
- Explicit dominated-rules gate with fix instructions

Added to Common Mistakes:
- clippy without -D warnings
- dominated rules test
- skipping full cargo test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Resolves clippy -D warnings failure on Rust 1.94 CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@zazabap zazabap closed this Apr 4, 2026
GiggleLiu added a commit that referenced this pull request Apr 6, 2026
* docs: add design spec for proposed reductions Typst note

Covers 9 reductions: 2 NP-hardness chain extensions (#973, #198),
4 Tier 1a blocked issues (#379, #380, #888, #822), and
3 Tier 1b blocked issues (#892, #894, #890).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: add verify-reduction skill for mathematical verification of reductions

New skill: /verify-reduction <issue-number>

End-to-end pipeline that takes a reduction rule issue and produces:
1. Typst proof (Construction/Correctness/Extraction/Overhead + YES/NO examples)
2. Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5)
3. Lean 4 lemmas (non-trivial structural proofs required)

Follows issue-to-pr conventions: creates worktree, works in isolation, submits PR.

Strict quality gates (zero tolerance):
- No "trivial" category — every reduction ≥5000 checks
- 7 mandatory Python sections including NO (infeasible) example
- Non-trivial Lean required (rfl/omega tautologies rejected)
- Zero hand-waving in Typst ("clearly", "obviously" → rejected)
- Mandatory gap analysis: every proof claim must have a test
- Self-review checklist with 20+ items across 4 categories

Developed and validated through PR #975 (800K+ checks, 3 bugs caught)
and tested on issues #868 (caught wrong example) and #841 (35K checks).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* style: add YAML frontmatter + professional tone to verify-reduction skill

- Added frontmatter (name, description) matching other skills' convention
- Toned down aggressive language ("ZERO TOLERANCE", "THE HARSHEST STEP",
  "NON-NEGOTIABLE") to professional but firm language
- All quality gates unchanged — same strictness, better presentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: adversarial multi-agent verification in verify-reduction skill

Replaces Lean-required gates with adversarial second agent:

- Step 5: Adversary agent independently implements reduce() and
  extract_solution() from theorem statement only (not constructor's script)
- Step 5c: Cross-comparison of both implementations on 1000 instances
- Lean downgraded from required to optional
- hypothesis property-based testing for n up to 50
- Quality gates: 2 independent scripts ≥5000 checks each + cross-comparison

Design rationale (docs/superpowers/specs/2026-04-01-adversarial-verification-design.md):
- Same agent writing proof + test is the #1 risk for AI verification
- Two independent implementations agreeing > one + trivial Lean
- Lean caught 0 bugs in PR #975; Python caught all 4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: design spec for verify-reduction skill enhancements

Typst↔Python auto-matching, test vectors JSON for downstream
consumption by add-rule and review-pipeline, adversary tailoring
by reduction type, compositional verification via pred CLI.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs: implementation plan for verify-reduction enhancements

5 tasks: update verify-reduction (Step 4.5 auto-matching, Step 5 typed
adversary, Step 8 downstream artifacts), create add-reduction skill,
register in CLAUDE.md.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: enhance verify-reduction with test vectors export, typed adversary, pipeline integration

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: create add-reduction skill — consumes verified artifacts from verify-reduction

* feat: register add-reduction skill in CLAUDE.md, update verify-reduction description

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: three improvements to verify-reduction and add-reduction skills

1. verify-reduction Step 1: type compatibility gate — checks source/target
   Value types before proceeding. Stops and comments on issue if types are
   incompatible (e.g., optimization → decision needs K parameter).

2. add-reduction Step 7: mandatory cleanup of verification artifacts from
   docs/paper/verify-reductions/ — Python scripts, JSON, Typst, PDF must
   not get into the library.

3. add-reduction Steps 4/4b/5: mandatory requirements from #974 —
   canonical example in rule_builders.rs (Check 9), example-db lookup
   test (Check 10), paper reduction-rule entry (Check 11).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: concise verify-reduction (761→124 lines) + self-contained add-reduction

verify-reduction: removed verbose templates, condensed checklists into
prose, kept all requirements but removed boilerplate code blocks that
the agent can derive from context.

add-reduction: integrated add-rule Steps 1-6, write-rule-in-paper
Steps 1-6, and #974 requirements (Checks 9/10/11) into a single
self-contained skill. No need to read 3 other skills.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: restore structural requirements in verify-reduction (124→274 lines)

The previous rewrite over-condensed the skill, removing gates that agents
need to follow: 7-section descriptions with table, minimum check count
table, check count audit template, gap analysis format, common mistakes
table, and self-review checklist with checkboxes.

Restored: all structural gates and requirements.
Kept concise: no verbose Python/Typst code templates (agent derives these).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: harden add-reduction with file-level verification gates

Steps 4, 4b, 5 now have HARD GATE labels with verification commands
that check the SPECIFIC required files appear in `git diff --name-only`.
Step 8 has a pre-commit gate that lists all 6 required files and blocks
commit if any is missing.

Root cause: subagents skipped Steps 4 (put example in rule file instead
of rule_builders.rs) and 5 (skipped paper entry entirely) because the
skill said "MANDATORY" but had no mechanical enforcement.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add CI-equivalent checks to add-reduction pre-commit gate

Root cause: PRs #985 and #991 failed CI because:
1. Local clippy doesn't use -D warnings but CI does (caught needless_range_loop)
2. New reductions can create paths that dominate existing direct reductions
   (test_find_dominated_rules_returns_known_set has hardcoded known set)

Added to Step 6:
- Mandatory `cargo clippy -- -D warnings` (not just `cargo clippy`)
- Mandatory `cargo test` (full suite, not filtered)
- Explicit dominated-rules gate with fix instructions

Added to Common Mistakes:
- clippy without -D warnings
- dominated rules test
- skipping full cargo test

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: correct add-reduction HARD GATE for canonical examples

rule_builders.rs is a 4-line pass-through — canonical examples are
registered via canonical_rule_example_specs() in each rule file,
wired through mod.rs. Updated Step 4 to match actual architecture.

Also added analysis.rs to git add list (for dominated-rules updates).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat: parent-side verification + pre-commit hook for add-reduction

Two enforcement mechanisms that don't rely on subagent compliance:

1. Parent-side verification (Step 8a): After subagent reports DONE,
   the parent runs file gate checks independently. If any required
   file is missing, sends the subagent back — doesn't trust self-report.

2. Pre-commit hook (.claude/hooks/add-reduction-precommit.sh):
   Mechanically blocks commits of new rule files unless example_db.rs,
   reductions.typ, and mod.rs are also staged. Subagents cannot bypass.

Root cause: subagents skip HARD GATE steps despite skill text saying
"MANDATORY". Text-based enforcement doesn't work — need mechanical
checks that run after the subagent, not instructions the subagent reads.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: strengthen type compatibility gate in verify-reduction skill

Expanded the type compatibility table to explicitly list all incompatible
pairs (Min→Or, Max→Or, Max→Min, Min→Max, Or→Sum, etc.) with concrete
examples from batch verification (#198 MVC→HamCircuit, #890 MaxCut→OLA).

Added common mistake entry for proceeding past the type gate.

Learned from batch run: 5 out of 50 reductions were mathematically
verified but turned out to be unimplementable as ReduceTo due to
type mismatches that the original gate didn't catch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Delete docs/superpowers/plans/2026-04-01-verify-reduction-enhancements.md

* Delete docs/superpowers/specs/2026-03-31-proposed-reductions-note-design.md

* Delete .claude/CLAUDE.md

* Revert "Delete .claude/CLAUDE.md"

This reverts commit 71c1444.

* chore: remove docs/superpowers/specs/ directory

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* refactor: integrate verify-reduction into add-rule pipeline

- Delete /add-reduction skill and pre-commit hook (absorbed into /add-rule)
- /add-rule now runs /verify-reduction by default; --no-verify to skip
- /verify-reduction simplified: no worktree, no PR, no saved artifacts
- /issue-to-pr passes --no-verify flag through to /add-rule
- Update CLAUDE.md skill descriptions

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Jinguo Liu <cacate0129@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Rule] NOT-ALL-EQUAL 3SAT to SET SPLITTING

1 participant